Learning Optimal Policies in Markov Decision Processes with Value Function Discovery

Abstract

In this paper we describe recent progress in our work on Value Function Discovery (VFD), a novel method for discovery of value functions for Markov Decision Processes (MDPs). In a previous paper we described how VFD discovers algebraic descriptions of value functions (and the corresponding policies) using ideas from the Evolutionary Algorithm field. A special feature of VFD is that the descriptions include the model parameters of the MDP. We extend that work and show how additional information about the structure of the MDP can be included in VFD. This alternative use of VFD still yields near-optimal policies, and is much faster. Besides increased performance and improved run times, this approach illustrates that VFD is not restricted to learning value functions and can be applied more generally.

Bibliographic record

Authors
Onderwater, Martijn; Bhulai, Sandjai; van der Mei, Rob
Year 2015
Original format PDF
Language of text en

Similar literature

1. Learning Optimal Policies in Markov Decision Processes with Value Function Discovery [J]. Martijn Onderwater, Sandjai Bhulai, Rob van der Mei. Performance Evaluation Review. 2015, No. 2
2. On the Convergence of Optimal Actions for Markov Decision Processes and the Optimality of (s, S) Inventory Policies [J]. Feinberg Eugene A., Lewis Mark E. Naval Research Logistics. 2018, No. 8
3. Properties of the optimality equation and optimal policies in discrete time Markov decision processes [J]. Qiying Hu, Wuyi Yue. 電子情報通信学会技術研究報告. 回路とシステム. Circuits and Systems. 2002, No. 427
4. Sufficiency of Markov policies for continuous-time Markov decision processes and solutions to Kolmogorov's forward equation for jump Markov processes [C]. Feinberg E.A., Mandava M., Shiryaev A.N. IEEE Annual Conference on Decision and Control. 2013
5. Optimal learning: Computational procedures for Bayes-adaptive Markov decision processes [D]. Duff, Michael O'Gordon. 2002
6. Evolving Robust Policy Coverage Sets in Multi-Objective Markov Decision Processes Through Intrinsically Motivated Self-Play [O]. Sherif Abdelfattah, Kathryn Kasmarik, Jiankun Hu. 2018
7. Regret-optimal policies in absorbing semi-Markov decision processes with multiple constraints (The Development of Information and Decision Processes) [O]. Kadota Yoshinobu, Kurano Masami, Yasuda Masami. 2006
